Enron Emails as Graph Data Corpus for Large-scale Graph Querying Experimentation

Authors

  • Michal Laclavík
  • Martin Šeleng
  • Marek Ciglan
  • Ladislav Hluchý
Abstract

In this paper we describe the Enron email corpus in a graph/network data format. The nodes of the graph are emails and the named entities (NE) extracted from their text, such as people, email addresses, and telephone numbers. Edges link NEs that co-occur in the same email part, paragraph, or sentence, or that belong to the same composite NE. The Enron Graph corpus contains a few million nodes, which makes it a fairly large corpus for experimenting with graph querying techniques such as graph traversal or spreading activation. The idea is to make this data available for future experiments.
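The graph structure described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the node naming scheme (`email:1`, `person:Alice`), the input layout, the function names, and the `decay` and `steps` parameters are all assumptions made for illustration, and only sentence-level co-occurrence is modelled. The spreading step is one simple variant of spreading activation, in which each active node passes a decayed share of its activation equally to its neighbours.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(emails):
    """Build an undirected graph from a hypothetical input layout.

    emails maps an email id to a list of sentences; each sentence is a
    list of named-entity strings assumed to be already extracted.
    """
    graph = defaultdict(set)
    for email_id, sentences in emails.items():
        for entities in sentences:
            # Connect the email node to every entity it contains.
            for entity in entities:
                graph[email_id].add(entity)
                graph[entity].add(email_id)
            # Connect entities that co-occur in the same sentence.
            for a, b in combinations(sorted(set(entities)), 2):
                graph[a].add(b)
                graph[b].add(a)
    return graph


def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Propagate activation from seed nodes for a fixed number of steps."""
    activation = defaultdict(float)
    frontier = {node: 1.0 for node in seeds}
    for node, value in frontier.items():
        activation[node] += value
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            neighbours = graph.get(node, ())
            if not neighbours:
                continue
            # Each neighbour receives an equal, decayed share.
            share = value * decay / len(neighbours)
            for neighbour in neighbours:
                next_frontier[neighbour] += share
        for node, value in next_frontier.items():
            activation[node] += value
        frontier = next_frontier
    return dict(activation)


# Made-up sample data, not taken from the Enron corpus.
sample_emails = {
    "email:1": [["person:Alice", "phone:555-0100"], ["person:Bob"]],
    "email:2": [["person:Alice", "person:Bob"]],
}
sample_graph = build_graph(sample_emails)
sample_scores = spread_activation(sample_graph, ["phone:555-0100"], steps=1)
```

Starting from a telephone number, nodes with higher scores in `sample_scores` are those most closely tied to it through shared emails and sentences. On a graph with millions of nodes, an in-memory dictionary of sets like this may not be practical, which is the kind of scaling question the corpus is meant to support.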

Similar resources

Authorship Attribution of E-Mail: Comparing Classifiers over a New Corpus for Evaluation

Until very recently, the email collections that have been available for research have been rather artificially created and consist of emails that contributors have chosen to make available. These collections serve very well for certain applications, but are certainly not representative of a person’s email habits; thus, they have not been realistic resources for testing automatic techniques for ...

A Query Based Approach for Mining Evolving Graphs

An evolving graph is a graph that can change over time. Such graphs can be applied in modelling a wide range of real-world phenomena, like computer networks, social networks and protein interaction networks. This paper addresses the novel problem of querying evolving graphs using spatio-temporal patterns. In particular, we focus on answering selection queries, which can discover evolving subgra...

Vertex Nomination via Content and Context

If I know of a few persons of interest, how can a combination of human language technology and graph theory help me find other people similarly interesting? If I know of a few people committing a crime, how can I determine their co-conspirators? Given a set of actors deemed interesting, we seek other actors who are similarly interesting. We use a collection of communications encoded as an attri...

Quantifying and Comparing Centrality Measures for Network Individuals as Applied to the Enron Corpus

The ever increasing body of social networks creates an opportunity for extensive network analysis and investigations of communications, cliques, and network contributions. In this study, we focus our attention on the Enron email corpus and the corresponding network of employees, attempting to gather information from the email communications. Methods of data reduction on the email corpus were us...

Introducing the Enron Corpus

A large set of email messages, the Enron corpus, was made public during the legal investigation concerning the Enron corporation. This dataset, along with a thorough explanation of its origin, is available at http://www-2.cs.cmu.edu/~enron/. This paper provides a brief introduction and analysis of the dataset. The raw Enron corpus contains 619,446 messages belonging to 158 users. We cleaned the...

Publication date: 2011